A Condensation Approach to Privacy Preserving Data Mining
نویسندگان
چکیده
In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy preserving data mining of multi-dimensional data. Previous work for privacy preserving data mining uses a perturbation approach which reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution based algorithm for each data mining problem, since it does not use the multi-dimensional records, but uses aggregate distributions of the data as input. This leads to a fundamental re-design of data mining algorithms. In this paper, we will develop a new and flexible approach for privacy preserving data mining which does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. This anonymized data closely matches the characteristics of the original data including the correlations among the different dimensions. We present empirical results illustrating the effectiveness of the method.
منابع مشابه
On Variable Constraints in Privacy Preserving Data Mining
In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. A recent framework performs privacy preserving data mining by using a condensation based approach....
متن کاملPrivacy Preserving Data Mining
There is a tremendous increase in the research of data mining. Data mining is the process of extraction of data from large database. Knowledge Discovery in database (KDD) is another name of data mining. Privacy protection has become a necessary requirement in many data mining applications due to emerging privacy legislation and regulations. One of the most important topics in research community...
متن کاملOptimizing Privacy-Accuracy Tradeoff for Privacy Preserving Distance-Based Classification
Privacy concerns often prevent organizations from sharing data for data mining purposes. There has been a rich literature on privacy preserving data mining techniques that can protect privacy and still allow accurate mining. Many such techniques have some parameters that need to be set correctly to achieve the desired balance between privacy protection and quality of mining results. However, th...
متن کاملA centralized privacy-preserving framework for online social networks
There are some critical privacy concerns in the current online social networks (OSNs). Users' information is disclosed to different entities that they were not supposed to access. Furthermore, the notion of friendship is inadequate in OSNs since the degree of social relationships between users dynamically changes over the time. Additionally, users may define similar privacy settings for their f...
متن کاملOn Anonymization of String Data
String data is especially important in the privacy preserving data mining domain because most DNA and biological data is coded as strings. In this paper, we will discuss a new method for privacy preserving mining of string data with the use of simple template based condensation models. The template based model turns out to be effective in practice, and preserves important statistical characteri...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004